Efficient SVDD sampling with approximation guarantees for the decision boundary

Abstract

Support Vector Data Description (SVDD) is a popular one-class classifier for anomaly and novelty detection. But despite its effectiveness, SVDD does not scale well with data size. To avoid prohibitive training times, sampling methods select small subsets of the training data on which SVDD trains a decision boundary hopefully equivalent to the one obtained on the full data set. According to the literature, a good sample should therefore contain so-called boundary observations that SVDD would select as support vectors on the full data set. However, non-boundary observations also are essential to not fragment contiguous inlier regions and to avoid poor classification accuracy. Other aspects, such as selecting a sufficiently representative sample, are important as well. But existing sampling methods largely overlook them, resulting in poor classification accuracy. In this article, we study how to select a sample considering these points. Our approach is to frame SVDD sampling as an optimization problem, where constraints guarantee that sampling indeed approximates the original decision boundary. We then propose RAPID, an efficient algorithm to solve this optimization problem. RAPID does not require any tuning of parameters, is easy to implement and scales well to large data sets. We evaluate our approach on real-world and synthetic data. Our evaluation is the most comprehensive one for SVDD sampling so far. Our results show that RAPID outperforms its competitors in classification accuracy, in sample size, and in runtime.
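The sampling idea in the abstract can be illustrated with a deliberately simplified sketch. It is not RAPID and not the authors' method. It assumes a hard-margin SVDD with a linear kernel, which reduces to finding the minimum enclosing ball of the data, and it approximates that ball with the Badoiu-Clarkson iteration (a swapped-in technique). The names `enclosing_ball` and `coverage`, the sample size of 15, and the "farthest from the mean" selection heuristic are all hypothetical choices made for illustration.

```python
import math
import random


def enclosing_ball(points, iterations=1000):
    """Approximate the minimum enclosing ball (Badoiu-Clarkson iteration).

    Assumption: this stands in for a hard-margin, linear-kernel SVDD.
    """
    center = list(points[0])
    for i in range(1, iterations + 1):
        # Move the center a shrinking step toward the currently farthest point.
        farthest = max(points, key=lambda p: math.dist(p, center))
        center = [c + (f - c) / (i + 1) for c, f in zip(center, farthest)]
    radius = max(math.dist(p, center) for p in points)
    return center, radius


def coverage(points, center, radius):
    """Fraction of points that fall inside the ball (center, radius)."""
    return sum(math.dist(p, center) <= radius for p in points) / len(points)


random.seed(0)
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(300)]

# Hypothetical "boundary" heuristic: keep the 15 points farthest from the mean.
mean = [sum(axis) / len(data) for axis in zip(*data)]
boundary_sample = sorted(data, key=lambda p: math.dist(p, mean))[-15:]

# Baseline for comparison: a uniform random sample of the same size.
random_sample = random.sample(data, 15)

full_center, full_radius = enclosing_ball(data)
for name, sample in [("boundary", boundary_sample), ("random", random_sample)]:
    center, radius = enclosing_ball(sample)
    print(name, "sample radius:", round(radius, 3),
          "| full radius:", round(full_radius, 3),
          "| coverage of full data:", round(coverage(data, center, radius), 3))
```

Comparing the two printed lines shows how closely a ball fitted on each sample reproduces the ball fitted on all 300 points. The sketch only covers the "boundary observations" part of the argument: with a single ball there are no contiguous inlier regions to fragment, so the abstract's point about non-boundary observations only appears with kernelized SVDD.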

Similar resources

Efficient Boundary Tracking Through Sampling

The proposed algorithm for image segmentation is inspired by an algorithm for autonomous environmental boundary tracking. The algorithm relies on a tracker that traverses a boundary between regions in a sinusoidal-like path. Page’s cumulative sum (CUSUM) procedure and other methods are adapted to handle a high level of noise. Applications to large data sets such as hyperspectral, are of particu...

Rapid Sampling for Visualizations with Ordering Guarantees

Visualizations are frequently used as a means to understand trends and gather insights from datasets, but often take a long time to generate. In this paper, we focus on the problem of rapidly generating approximate visualizations while preserving crucial visual properties of interest to analysts. Our primary focus will be on sampling algorithms that preserve the visual property of ordering; our...

Reeb Space Approximation with Guarantees

The Reeb space, which generalizes the notion of a Reeb graph, is one of the few tools in topological data analysis and visualization suitable for the study of multivariate scientific datasets. First introduced by Edelsbrunner et al. [3], the Reeb space of a multivariate mapping f : X → R^d parameterizes the set of components of preimages of points in R^d. In this paper, we formally prove the converg...

Image Segmentation Through Efficient Boundary Sampling

This paper presents a combined geometric and statistical sampling algorithm for image segmentation inspired by a recently proposed algorithm for environmental sampling using autonomous robots [1].

Mixed Bregman Clustering with Approximation Guarantees

Two recent breakthroughs have dramatically improved the scope and performance of k-means clustering: squared Euclidean seeding for the initialization step, and Bregman clustering for the iterative step. In this paper, we first unite the two frameworks by generalizing the former improvement to Bregman seeding — a biased randomized seeding technique using Bregman divergences — while generalizing ...

Journal

Journal title: Machine Learning

Year: 2022

ISSN: 0885-6125, 1573-0565

DOI: https://doi.org/10.1007/s10994-022-06149-0